Medical Physics — Latest Matching Preprints

1

Comparative Study on Image Quality of Deep Learning and Adaptive Statistical Iterative Reconstruction-V in Thin Layer CT of Liver Lesions

Yang, J.; Li, L.; Cao, J.; Zhang, J.

2026-05-26 radiology and imaging 10.64898/2026.05.23.26353923 medRxiv

Top 0.1%

4.8%

Objective: This study aims to compare the advantages and disadvantages of deep learning image reconstruction (DLIR) and adaptive statistical iterative reconstruction-V (ASIR-V) in thin-slice (2.5 mm) CT images of hepatic lesions characterized by high and low contrast. Additionally, the study seeks to determine the optimal DLIR strength for the evaluation of liver lesions. Methods: A retrospective analysis was performed on 90 patients who underwent abdominal contrast-enhanced CT scans. Group A comprised 48 patients with low-contrast lesions, while Group B included 42 patients with high-contrast lesions. The acquired images were reconstructed using post-processing DLIR at low (DLIR-L), medium (DLIR-M), and high (DLIR-H) strengths, all with a slice thickness of 2.5 mm (subgroups A1-A3, B1-B3). Furthermore, images were reconstructed with ASIR-V at 50% strength at slice thicknesses of 2.5 mm and 5 mm (subgroups A4/B4 and A5/B5, respectively). CT values and standard deviations (SD) of the liver and lesions were measured, and the corresponding signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) were calculated. The edge rise slope (ERS) was determined using ImageJ software by measuring CT values along a line from the liver parenchyma to the lesion. Objective metrics were compared using one-way ANOVA, with independent samples t-tests applied for inter-group differences. Subjective scoring, which encompassed noise level, diagnostic confidence, and lesion margin delineation, was conducted by two radiologists, with differences analyzed using the Kappa test. Results: Objective evaluation revealed a progressive decrease in lesion SD and a progressive increase in SNR and CNR from subgroups A1/B1 to A3/B3. The SD of subgroup A2 decreased by 57.4% compared to A4, while the SNR and CNR of A2 increased by 19.3% and 24.6%, respectively, compared to A4. Although subgroup B2 had a lower SNR than B5, the difference was not statistically significant. SNR and CNR in B2 increased by 24.1% and 11.9%, respectively, compared to B4.
ERS gradually decreased from A1/B1 to A3/B3. ERS values in A2 and B2 increased by 27.0% and 39.4%, respectively, relative to A5 and B5. Although A3 had a lower ERS than A1 and A2, all DLIR subgroups exhibited higher ERS than A5; similar trends were observed in Group B. Subjective evaluation indicated good inter-reader agreement (Kappa > 0.61, p < 0.05). As DLIR strength increased, noise scores rose progressively in both groups. However, noise in A2 and B2 was lower than in A4/A5 and B4/B5. Diagnostic confidence and lesion margin delineation scores were highest in A2 and B2, while all subjective scores were lowest in A5 and B5. Discussion: Most prior studies evaluated the liver or vessels, or confirmed that image quality can be guaranteed at low doses. However, few studies have examined specific individual lesions, which this study aims to investigate. Lesion details and detection rate were analyzed separately to confirm the clinical acceptability of 2.5-mm DLIR images for lesions of different contrast. Conclusion: For both high- and low-contrast hepatic lesions, DLIR provides superior image quality compared to ASIR-V, with the 2.5 mm DLIR-M setting being optimal. DLIR-M reduces image noise, improves spatial resolution, and produces images more suitable for diagnostic purposes.
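The abstract reports SNR, CNR, and percentage changes between subgroups without stating the formulas used. The sketch below assumes commonly used definitions (SNR as mean CT value over its SD, CNR as the absolute liver-lesion difference over the liver SD); these definitions, the function names, and the sample values are illustrative assumptions, not the authors' method.

```python
from statistics import mean, stdev


def snr(roi_values):
    """SNR of one region of interest: mean CT value / sample SD (assumed definition)."""
    return mean(roi_values) / stdev(roi_values)


def cnr(lesion_values, liver_values):
    """CNR: |mean liver - mean lesion| / liver SD (assumed definition)."""
    return abs(mean(liver_values) - mean(lesion_values)) / stdev(liver_values)


def percent_change(new, reference):
    """Relative change of `new` against `reference`, in percent."""
    return 100.0 * (new - reference) / reference


if __name__ == "__main__":
    # Hypothetical HU samples, not data from the study.
    liver = [109.0, 110.0, 111.0]
    lesion = [64.0, 65.0, 66.0]
    print("SNR (liver):", snr(liver))
    print("CNR:", cnr(lesion, liver))
    print("Change vs. reference (%):", percent_change(120.0, 100.0))
```

With definitions like these, a lower SD raises both SNR and CNR, which matches the direction of the trends reported from DLIR-L to DLIR-H.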

2

Using artificial intelligence for radiotherapy clinical trial quality assurance: analysis of a multi-institutional clinical trial for neurovascular-sparing prostate stereotactic ablative radiotherapy

Doucette, M.; Zhang, Y.; Liao, C.-Y.; Lin, M.-H.; Yan, Y.; Dess, R. T.; Tendulkar, R. D.; Garant, A.; Hannan, R.; Jiang, S.; Nguyen, D.; Desai, N.; Yang, D. X.

2026-05-29 health informatics 10.64898/2026.05.27.26354252 medRxiv

Top 0.2%

3.6%

Our study evaluated whether a deep learning auto-segmentation model combined with machine learning triage can streamline radiotherapy clinical trial quality assurance (QA). We analyzed 107 stereotactic ablative radiotherapy (SABR) cases from a multi-institutional phase II clinical trial of neurovascular-sparing prostate SABR, focusing on physician contours of the internal pudendal artery (IPA) as a novel organ-at-risk with substantial interobserver variability. Contours were scored by the trial principal investigator as Per-Protocol or Minor Deviation/Unacceptable. We applied a deep learning model for IPA auto-segmentation. Agreement between human and AI contours was then quantified using 14 overlap, distance, and surface metrics, and a supervised classifier was trained on these metrics to flag clinical trial protocol deviations. AI segmentation achieved only modest geometric accuracy, with a mean Dice similarity coefficient of 0.446 and a 95th percentile Hausdorff distance of 14.23. When all 14 metrics were incorporated, however, a machine learning classifier yielded an AUROC of 0.836 and flagged all Minor Deviation/Unacceptable cases on the 27-case hold-out set (100% sensitivity), with 6 false positives and no false negatives. AI segmentation combined with metrics-based machine learning can triage protocol deviations within a multi-institution radiotherapy clinical trial, supporting prospective evaluation of AI-assisted trial QA.
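Two of the quantities named in this abstract have standard definitions that are easy to state in code: the Dice similarity coefficient between two contours and the sensitivity of a triage classifier. The sketch below is a minimal illustration; the voxel-set representation, the function names, and the true-positive count are hypothetical, since the abstract gives only the hold-out size, the false positives, and the false negatives.

```python
def dice(voxels_a, voxels_b):
    """Dice similarity coefficient, 2|A and B| / (|A| + |B|), for two sets of voxel indices."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty contours agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def sensitivity(tp, fn):
    """Fraction of true deviations that the classifier flags."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of per-protocol cases that the classifier leaves unflagged."""
    return tn / (tn + fp)


if __name__ == "__main__":
    physician = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}
    model = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}
    print("Dice:", dice(physician, model))

    # With zero false negatives, sensitivity is 1.0 for any positive count.
    hypothetical_tp = 5  # assumption: not reported in the abstract
    print("Sensitivity:", sensitivity(hypothetical_tp, 0))
```

Sensitivity depends only on true positives and false negatives, so 100% sensitivity can coexist with false positives; those lower specificity instead.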

3

Weight-Guided Constraints for Body Model and Lead Selection in Pediatric CIED MRI Safety Simulations

Hameed, S.; Henry, K.; Jiang, F.; Bhusal, B.; Dillenbeck, H.; Gakenheimer-Smith, L.; Webster, G.; Golestani Rad, L.

2026-05-30 radiology and imaging 10.64898/2026.05.26.26354162 medRxiv

Top 0.2%

3.6%

Pediatric patients with cardiac implantable electronic devices (CIEDs) face limited MRI access due to RF-induced heating, and computational modeling is increasingly used to characterize this risk. The validity of these simulations, however, depends on pairing body models with clinically realistic lead configurations, guidance that is currently lacking. We retrospectively analyzed 302 CIED surgeries in 281 pediatric patients to derive weight-based constraints for simulation design. Weight alone discriminated epicardial from endocardial lead implantation with AUC = 0.90, and adding age and height yielded no improvement, supporting weight as a sufficient single-parameter selection metric. The probabilistic crossover between approaches occurred at 44 kg, substantially higher than the 10 to 15 kg threshold commonly cited in the literature, with a broad transition zone of 21 to 66 kg in which both lead types were routinely used. Lead length was likewise weight-constrained: only 25 cm leads were observed in patients below 6 kg, and leads of 45 cm or longer were uncommon below 50 kg. These findings yield a three-tier framework, with epicardial-only configurations below 21 kg, dual configurations within 21 to 66 kg, and weight-thresholded lead lengths throughout, enabling MRI safety simulations to focus on clinically realizable anatomy and device combinations.

4

DISCERN: A Clinical Impact-aware Framework for Radiology Report Comparison

Sharma, R.; Beeche, C.; Dong, J.; Zhuang, R.; Qu, H.; Zhang, R.; Gangaram, V.; Goswami, P.; Xin, J.; Ballard, J.; Goldberg, A.; Sagreiya, H.; Long, Q.; Chen, T.; Witschey, W. R.

2026-05-27 radiology and imaging 10.64898/2026.05.26.26353612 medRxiv

Top 0.2%

2.9%

The surge in medical imaging has spurred the development of vision-language models (VLMs) to alleviate radiologist workloads. However, clinical deployment is hindered by the lack of meaningful evaluation frameworks. Current metrics, ranging from semantic similarity to large language model (LLM)-based judges, often fail to distinguish between clinically trivial and critical discrepancies, poorly reflecting real-world clinical judgment. To address this, we introduce DISCERN (Discordance and Significance-aware Entity-level Radiology Report Comparison). DISCERN is a significance-aware framework that weighs report errors based on their potential impact on patient care. Our results demonstrate that DISCERN powered by closed-source LLMs aligns more closely with expert radiologist assessments than traditional metrics or current LLM evaluators, providing a more interpretable and clinically relevant benchmark. By modeling radiologist prioritization and entity-level feedback, DISCERN facilitates targeted model refinement and ensures the safer integration of generative AI into clinical workflows.

5

Within-Patient Comparison of Ga-PSMA-11 PET/CT in Prostate Cancer: Protocol-Conditional Biodistribution and Quantitative Non-Interchangeability

Kwon, W.-A.; Park, S.; Kim, R.; Lee, W.; Park, C.; Kim, T.-S.; Joung, J. Y.

2026-05-30 radiology and imaging 10.64898/2026.05.28.26354302 medRxiv

Top 0.3%

2.1%

Background: Prostate-specific membrane antigen (PSMA) PET/CT is central to prostate cancer staging and theranostic workflows. To our knowledge, no direct within-patient comparison of [18F]FC303 ([18F]Florastamin) and [68Ga]Ga-PSMA-11 has been reported. We performed a preliminary paired method-comparison study under non-harmonized acquisition protocols. Patients and Methods: Twenty patients with histologically confirmed prostate cancer underwent [68Ga]Ga-PSMA-11 PET/CT (185 ± 37 MBq, 60 ± 10 min) followed by [18F]FC303 PET/CT (370 ± 37 MBq, 105 ± 15 min) on the same PET/CT system within each patient (median interval, 29.5 days). Index targets were anatomically matched to the biopsied or surgically sampled lesion or target region. The primary malignant set included 18 histologically malignant targets; two histology-negative or indeterminate targets were included only in sensitivity analysis. Fixed [68Ga]Ga-PSMA-11-first scan order and the 45-min uptake-time difference were central interpretive constraints. Results: Across five predefined reference organs, [18F]FC303 showed lower SUVmean than [68Ga]Ga-PSMA-11 (all Benjamini-Hochberg-adjusted p < 0.001; [68Ga]/[18F]FC303 geometric mean ratio [GMR], 1.29-3.89). In the primary malignant set, [18F]FC303 lesion SUVmax was lower than [68Ga]Ga-PSMA-11 (median, 11.3 vs 18.1; paired median difference, -5.50; 95% CI, -6.85 to -2.90; Wilcoxon p = 8.4 × 10⁻⁴), with strong rank correlation (Spearman ρ = 0.90). Passing-Bablok regression yielded β = 1.13 (95% CI, 1.04-1.45), and log-Bland-Altman GMR (FC303/[68Ga]) was 0.75, consistent with proportional non-interchangeability. Tumor-to-liver and tumor-to-mediastinum ratios did not differ significantly (GMR, 1.17 [95% CI, 0.94-1.45] and 0.96 [0.80-1.15], respectively); the study was not powered for equivalence. The n = 20 sensitivity analysis showed consistent directionality.
Conclusions: Under non-harmonized acquisition conditions, [18F]FC303 showed lower physiologic reference-organ SUVmean and malignant target-region SUVmax than [68Ga]Ga-PSMA-11, whereas tumor-to-liver and tumor-to-mediastinum ratios were not significantly different. Absolute SUVs were not interchangeable; [68Ga]Ga-PSMA-11-derived SUV thresholds should not be directly transferred to [18F]FC303 without tracer-specific calibration.

6

Voxel-wise temporal decomposition of hypoxia-targeted BOLD MRI: method development and proof-of-concept application in glioblastoma

Schmidlechner, T.; Stumpo, V.; Jehli, E.; Zerweck, L.; Bellomo, J.; Gönel, M.; Müller, F.; Sebök, M.; Bink, A.; Kulcsar, Z.; Weller, M.; Regli, L.; Fierstra, J.; van Niftrik, C. H. B.

2026-05-29 radiology and imaging 10.64898/2026.05.27.26354265 medRxiv

Top 0.5%

0.9%

Hypoxia-targeted BOLD MRI is a novel technique, which probes oxygenation physiology in response to a controlled transient hypoxia stimulus. In glioblastoma, the signal response is spatially and temporally heterogeneous. We developed a voxel-wise temporal decomposition framework for hypoxia-targeted BOLD MRI that separates the arrival of responses, transition phases, and steady state during controlled isocapnic hypoxia. Twenty healthy controls underwent 3-T BOLD MRI during a double hypoxic step challenge to establish a normative reference. Three patients with newly diagnosed glioblastoma were included as proof-of-concept cases. For each voxel, we estimated response arrival delay (Delaycorr), delay to plateau, delay to return and an O2-normalized steady-state response (HypoxiaSS). Healthy-control maps were used to construct a voxel-wise normative atlas and, for HypoxiaSS, a global-response-adjusted model for patient deviation mapping. In healthy controls, HypoxiaSS showed lower supratentorial between-subject variability than both whole-stimulus comparators (coefficient of variation: 1.77 versus 2.36 for Hypoxiaavg) and higher voxel-level step-to-step agreement (ICC(2,1): median 0.951 versus 0.792 for Hypoxiaavg). Whole-stimulus averaging exhibited a systematic step-2 signal amplification present in 19 of 20 subjects, which was absent from HypoxiaSS. A single global response scalar explained a median 72.5% of voxel-wise between-subject variance in HypoxiaSS. In proof-of-concept patient analyses, G-adjusted HypoxiaSS deviation maps and timing maps identified spatially coherent abnormalities that were partly complementary and extended beyond conventional MRI-defined lesion margins. Temporal decomposition improves the stability and interpretability of hypoxia-targeted BOLD MRI and provides a practical framework for population-referenced physiological mapping and atlas-based deviation mapping in glioblastoma.

7

Automated Segmentation of Cerebral Arteries on Three-Dimensional Rotational Angiography Using nnUNet v2: Prospective Validation with Quantitative Metrics and Expert Qualitative Assessment

Hofmeister, J.; Brina, O.; Rosi, A.; Bernava, G.; Reymond, P.; Muster, M.; Lovblad, K.-O.; Machi, P.

2026-05-26 radiology and imaging 10.64898/2026.05.20.26353640 medRxiv

Top 0.5%

0.9%

Background: Three-dimensional visualization and quantitative analysis of cerebral arteries on 3DRA are central to endovascular treatment planning, device selection, and cerebrovascular research. Manual segmentation is time-consuming and operator-dependent, yet no open-source deep learning model has been prospectively validated for this task on 3DRA. Methods: An nnUNet v2 model was trained for binary cerebral artery segmentation on 400 consecutive 3DRA acquisitions from three angiographic systems, comparing four configurations across architectures and loss functions. The best-performing configurations were prospectively validated on 40 patients using a dual approach: quantitative metrics (DSC, clDice, HD95, ASD, Precision, Recall), and blinded expert qualitative evaluation by two interventional neuroradiologists assessing 12 arterial segments, a global quality score, and clinical usability across 40 test cases. Results: The ensemble model achieved median DSC 0.917, clDice 0.932, and HD95 1.494 mm. Global quality scores were significantly lower for nnUNet v2 than for expert segmentations (median 4 vs 5, p<0.001), but nnUNet v2 segmentations were rated clinically usable in 88-90% of cases versus 95-98% for expert segmentations, without significant difference on the binary usability criterion. A consistent proximal-to-distal quality gradient was identified, with comparable scores at proximal arteries and the largest differences at distal arterial segments. Conclusion: nnUNet v2 with topology-aware training provides clinically usable cerebral artery segmentations on 3DRA, prospectively validated through both quantitative metrics and structured expert qualitative assessment, and represents a reproducible open-source foundation for endovascular and research applications.

8

PIE Toolbox: SSM-PCA Based Software for PET Diagnostic Pattern Analysis

Romanov, M.; Kireev, M.; Didur, M.; Cherednichenko, D.; Korotkov, A.; Valdes-Sosa, P.; Fan, Q.; Wang, Q.

2026-06-01 radiology and imaging 10.64898/2026.05.28.26354341 medRxiv

Top 0.5%

0.9%

One of the prominent methods in neuroimaging data processing is SSM-PCA, which is based on principal component analysis and allows for the identification of diagnostically significant patterns in the form of statistical maps. We developed PIE Toolbox, software that employs SSM-PCA and classification based on diagnostic patterns derived from functional and structural tomographic brain imaging. The program supports the entire analysis pipeline, including preprocessing of brain images, extraction of diagnostic patterns, building of classification models, and prediction based on them. The resulting diagnostic patterns are weighted principal components obtained through SSM-PCA, or their linear combinations. PIE Toolbox allows selection of relevant structural and functional brain patterns, computation of their expression values in regions of interest, classification using support vector machines, and evaluation of model performance via cross-validation. This approach enables the use of patterns as features of intergroup differences for individual diagnosis. The software has been validated on both simulated and ADNI datasets.

9

TopBrain Segmentation Challenge for Whole Brain Vessel Anatomy

Yang, K.; Shi, P.; Huang, H.; Musio, F.; Baazaoui, H.; Aydin, O. U.; Hilbert, A.; Hamadache, R. E.; Yalcin, C.; Zhang, M.; Falcetta, D.; de la Rosa, E.; Shit, S.; Prabhakar, C.; Wittmann, B.; Rokuss, M. R.; Kirchhoff, Y.; Al-Maskari, R.; Hoeher, L.; Juchler, N.; Casamitjana, A.; Cleary, J.; Schmick, A.; Baumgartner, P.; Deseoe, J.; Vandans, O.; Lee, D.; Oh, K.; LaBella, D.; Mazher, M.; Niederer, S. A.; Qayyum, A.; Liu, Y.; Chen, J.; Kim, W.; Asawalertsak, N.; Kim, M.; Shin, D.; Park, S.-H.; Kikuchi, S.; Zhang, Y.; Liu, J.; Cui, Y.; Qiu, Y.; Verschuur, A.; Zhang, J.; van der Schaaf, I.; Su, R.;

2026-05-30 radiology and imaging 10.64898/2026.05.28.26354312 medRxiv

Top 0.6%

0.7%

We present the TopBrain 2025 Challenge, the first benchmark for fine-grained multiclass segmentation of the whole brain vasculature in both computed tomography angiography (CTA) and magnetic resonance angiography (MRA). Building on the TopCoW challenge, TopBrain scales vessel annotation from the Circle of Willis to the entire brain, introducing a dataset of 90 annotated volumes across 48 landmark vessel classes spanning arterial and venous systems, of which 50 training volumes are publicly released. Vessel definitions were consolidated from established neuroanatomical references into a unified annotation scheme, and vessel caliber measurements along the centerline are reported for the first time across the whole brain vascular anatomy. To address the unique challenges of multiclass brain vessel segmentation, we propose an evaluation framework that accounts for detection in segmentation performance, assesses anatomical plausibility, and introduces novel contamination metrics that characterize inter-class prediction errors. Fifteen teams from over 220 registered participants submitted algorithms to the benchmark. The top-performing teams built on nnUNet with principled system design choices, achieving around 80% Dice scores, near-zero invalid neighbor counts, over 60% F1 scores for side-road vessels, and below 18% foreground contamination ratio. Larger vessels are easier to segment, while smaller and more complex vessels remain the true bottleneck. The annotated datasets and podium-finish algorithms are made publicly available on Zenodo.

10

Survival and neurologic outcomes after re-irradiation in children with diffuse midline glioma and diffuse intrinsic pontine glioma

Vaziri, T.; Vyas, D.; Alhumaid, M.; Lucas, C.-H.; Guryildirim, M.; Kilburn, L.; Gartrell, R. D.; Koldobskiy, M. A.; Raabe, E.; Cohen, K.; Ladra, M.; Acharya, S.

2026-06-01 oncology 10.64898/2026.05.29.26354429 medRxiv

Top 1%

0.3%

Background: Reirradiation (reRT) is increasingly offered following progression in diffuse intrinsic pontine glioma (DIPG) and diffuse midline glioma (DMG), though optimal patient selection remains a challenge. This study evaluated clinical outcomes after reRT in a contemporary cohort of patients with DIPG/DMG. Methods: Patients <26 years old with DMG/DIPG treated with radiation therapy between 2011-2025 were retrospectively reviewed. Primary endpoints included overall survival (OS2) and progression-free survival (PFS2), measured from first progression, and change in neurologic symptoms after reRT. Survival was estimated using Kaplan-Meier methods, with Cox proportional hazards modeling for prognostic factors. Results: Fifty-eight patients were included; 37 (63.8%) underwent reRT. Tumors were predominantly pontine (74.1%). ReRT was associated with improvement in motor function (51.4% vs. 9.5%, p=0.002), cranial nerve function (29.7% vs. 4.8%, p=0.044), and gait ataxia (35.1% vs. 9.5%, p=0.059). Median OS2 and PFS2 were improved with reRT (OS2: 9.67 vs. 2.57 months, p<0.001; PFS2: 5.63 vs. 1.57 months, p<0.001). OS2 was independently associated with reRT (HR 0.27, p<0.0001), pontine location (HR 2.94, p=0.004), and steroid use at progression (HR 4.12, p=0.001). PFS2 was independently associated with reRT (HR 0.23, p<0.0001) and distant pattern of failure (HR 2.83, p=0.037). Among reRT patients, non-pontine location was associated with improved OS2 (p=0.02), and local failure was associated with improved PFS2 (p=0.003). Conclusion: ReRT was associated with neurologic improvement and prolonged survival. Patients with non-pontine tumors or local-only failure might derive the greatest benefit. Prospective studies are warranted to define optimal dose/fractionation and refine patient selection.

11

From CCTA to Surgical Strategy: An Integrated AI Framework for Patient-Specific Coronary Artery Bypass Grafting Planning

Rezaeitaleshmahalleh, M.; Masoumi, S.; Debalme, E.; Sundt, T. M.; Aranki, S. F.; Shin, B.; Nezami, F. R.

2026-06-01 cardiovascular medicine 10.64898/2026.05.28.26354400 medRxiv

Top 1%

0.3%

Background: Coronary artery bypass grafting (CABG) remains the standard of care for complex multivessel and left main coronary artery disease. However, current preoperative planning remains largely subjective, relying on qualitative interpretation of coronary CT angiography (CCTA), operator-dependent stenosis grading, and fragmented multi-software workflows. Invasive fractional flow reserve (FFR), the reference standard for physiologic lesion assessment, is infrequently acquired preoperatively, leaving distal anastomosis planning without an objective hemodynamic basis. Methods: We developed a fully automated, AI-powered platform that converts routine CCTA into a patient-specific CABG planning workflow through five integrated modules: nnU-Net-based segmentation of coronary lumen and calcification; quantitative morphological and topological characterization generating more than thirty descriptors; automated stenosis detection using a local reference-radius formulation; a nine-point composite scoring framework for distal anastomosis site selection incorporating luminal caliber, landing-zone length, calcification burden, distal perfusion reserve, and bifurcation proximity; and interactive virtual graft construction coupled to a distributed reduced-order solver for pre- and post-bypass FFR estimation. Results: Lumen segmentation achieved a mean Dice similarity coefficient of 0.96 ± 0.01, whereas calcium segmentation achieved 0.73 ± 0.15 on the held-out cohort. Platform-derived FFR demonstrated strong agreement with invasively measured FFR (r=0.96, mean absolute relative difference 1.73 ± 1.42%) across the evaluated lesions, supporting the physiologic validity of the reduced-order hemodynamic solver.
End-to-end analysis from raw CCTA to hemodynamic assessment and virtual graft planning was completed in approximately seven minutes per case on a standard workstation, representing a substantial reduction in processing time compared with conventional multi-tool and CFD-based workflows. Conclusions: The proposed platform demonstrates the feasibility of rapid, reproducible, and physiology-informed CABG planning using routine CCTA. By integrating anatomical characterization, automated target-site analysis, virtual graft construction, and reduced-order hemodynamic assessment into a single workflow, the framework provides objective, quantitative surgical decision support compatible with routine clinical workflows. Keywords: Coronary artery bypass grafting (CABG); Fractional flow reserve (FFR); Coronary CT angiography (CCTA); Surgical planning

12

Left Ventricular Volume and Function Assessment Using a Reduced-Slice Approach in Cardiovascular Magnetic Resonance

Tejaswi, A.; Fyrdahl, A.; Sigfridsson, A.

2026-06-01 cardiovascular medicine 10.64898/2026.05.29.26354413 medRxiv

Top 1%

0.2%

Background: Cardiovascular magnetic resonance (CMR) quantification of the left ventricular (LV) volumes and ejection fraction (EF) typically involves manual segmentation of many short axis (SAx) and long axis (LAx) slices of the left ventricle. The scan time and the number of breath holds is proportional to the number of slices. We aimed to evaluate a geometric model of the left ventricle that could enable planimetry from a reduced number of slices. We sought to determine whether acceptable accuracy was retained for evaluating the End Diastolic Volume (EDV), End Systolic Volume (ESV), Stroke Volume (SV), and EF to provide a rapid and reliable clinical alternative. Methods: A cohort of 342 patients, median age: 54 (40 - 65) years, with full-stack CMR examinations was used. Nine geometrical combinations were evaluated: 3, 4 or 5 short axis slices and one of three LAx orientations (2-chamber, 3-chamber or 4-chamber) by retrospectively decimating the full-stack acquisition. LV volumes were calculated as a sum of trapezoidal approximations for apical and mid-cavity slices and a generalized prismoidal model at the base. The accuracy of the volume calculations was quantified against the full-stack reference for the EDV, ESV, SV, and EF using concordance correlation coefficient (CCC), two-way repeated measures ANOVA, pairwise tests, and Bayes factor log10(BF10) analysis. Results: The choice of the long axis (LAx) view was the most influential driver of accuracy (g2 = 0.104, for EDV), approximately 50 times more impactful than the number of SAx slices (g2 = 0.002, for EDV). Volumes calculated using the combination of 2-chamber LAx view and 5 SAx slices had the highest concordance with the full stack (CCC>0.90). While the estimated absolute volumes displayed a systematic negative bias, EF and SV remained highly robust due to bias cancellation. 
For a 2ch + 5 SAx protocol, EF bias was just 0.83% (LoA: -6.18 to 7.84%), with a minimum detectable change (MDC) of 7.01%, compared to 8.7% reported for expert human readers, suggesting strong concordance. Bayesian paired-samples t-tests yielded log10(BF10) = 6.42 in favor of 5 SAx over 3 SAx, constituting decisive evidence on the Jeffreys scale. The bias and limits of agreement (LoA) for stroke volume and ejection fraction were found to be lower than the scan-rescan reproducibility reported in the literature. Conclusion: This reduced-slice geometric model allows a reduced number of breath holds compared to a conventional full-stack CMR acquisition and provides acceptable accuracy, with bias less than scan-rescan variability.
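The bias cancellation mentioned in this abstract can be illustrated with the standard definitions SV = EDV - ESV and EF = SV / EDV. The sketch below is plain arithmetic under two assumed bias models (a constant offset and a proportional scale); the abstract does not state which form the volume bias takes, and the volumes used are hypothetical.

```python
def stroke_volume(edv, esv):
    """Stroke volume in mL: end-diastolic minus end-systolic volume."""
    return edv - esv


def ejection_fraction(edv, esv):
    """Ejection fraction in percent: stroke volume over end-diastolic volume."""
    return 100.0 * (edv - esv) / edv


if __name__ == "__main__":
    edv, esv = 150.0, 60.0  # hypothetical reference volumes in mL

    # Assumed bias model 1: the same offset on both volumes cancels in SV.
    offset = -10.0
    print("SV reference:", stroke_volume(edv, esv))
    print("SV with offset:", stroke_volume(edv + offset, esv + offset))

    # Assumed bias model 2: the same scale on both volumes cancels in EF.
    scale = 0.9
    print("EF reference:", ejection_fraction(edv, esv))
    print("EF with scale:", ejection_fraction(scale * edv, scale * esv))
```

A constant offset leaves SV unchanged but shifts EF, whereas a proportional underestimate leaves EF unchanged but shrinks SV, so how far each derived metric is protected depends on the form of the underlying volume bias.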

13

Automated quantification of cerebral microbleeds for ARIA-H monitoring in Aging and Alzheimer's Disease: A multicenter deep learning validation

Low, Z. X. B.; Rowsthorn, E.; Nazem-Zadeh, M.-R.; Francis, M.; Robb, C.; Howcroft, M.; Whiriskey, R.; Brodtmann, A.; McNeil, J. J.; Law, M.

2026-05-26 radiology and imaging 10.64898/2026.05.19.26353364 medRxiv

Top 1%

0.2%

We trained a self-configuring nnU-Net model for cerebral microbleed (CMB) segmentation in a heterogeneous multicenter sample (n=264), including 1.5T and 3T field strengths, SWI and T2*-GRE sequences, and community and clinical cohorts. Model performance was evaluated using 5-fold cross-validation with a focus on object-level detection metrics. Real-world performance was evaluated on scans from an unseen dataset of people with cerebrovascular disease (n=20). The model achieved 0.82 cluster Dice, 0.88 precision, and 0.77 sensitivity on hold-out test data. Notably, the model demonstrated a low false-positive rate, averaging 0.58 false positives (FPs) per scan, an improvement on existing publicly available models. The model achieved high performance in a dataset of people with Alzheimer's disease and mild cognitive impairment (0.89 cluster Dice, 0.94 sensitivity), supporting its utility in clinical settings where ARIA-H monitoring is critical. In external validation, the model maintained high robustness with 0.79 sensitivity and 0.95 FPs per scan. By leveraging a heterogeneous training strategy and a self-adapting architecture, we demonstrate that deep learning can achieve high-precision CMB detection that is robust to domain shifts. The low FP rate suggests this publicly available pipeline is suitable for automated screening and lesion counting in heterogeneous large-scale clinical trials, reducing the burden of manual quantification.

14

Application of SinoPlan in Trajectory Planning for Robot-Assisted Intracerebral Hematoma Puncture

Zhang, F. y.; Yao, J.; Zhou, Q. y.; fang, Y. c.; Hu, A.; Wang, Y.; Ding, W.; Wu, X.; Gu, Y.

2026-05-27 surgery 10.64898/2026.05.24.26353998 medRxiv

Top 1%

0.1%

Robot-assisted hematoma puncture has seen significant development in primary hospitals across the country. The Sino Plan software system is the core of the intelligent surgical robot, independently developed by Sinovation. We conducted a comparative study of imaging indicators, such as residual hematoma volume and hematoma clearance rate, as well as prognostic indicators, in patients who underwent hematoma puncture at our hospital over a 9-year period, before and after the introduction of Sino Plan. The results indicated that following the application of Sino Plan, the hematoma clearance rate was significantly enhanced, and the residual hematoma volume was markedly reduced. Regarding patient prognosis, there was no significant difference in GCS scores between the two groups, but the incidence of adverse prognostic events was lower in patients where Sino Plan was utilized. In conclusion, this 9-year retrospective analysis at our hospital reveals that Sino Plan offers distinct advantages. However, its application in certain special cases suggests that further improvements to the software are warranted to better meet the demands of more specific clinical scenarios.

15

Assessing Lipid Core Burden Index with Depolarization-Sensitive Optical Frequency Domain Imaging

Jones, G.; Otsuka, K.; Fujisawa, N.; Yamaura, H.; Matsumoto, K.; Okamoto, A.; Yamaguchi, T.; Shimada, T.; Kagawa, S.; Yamazaki, T.; Akasaka, T.; Bouma, B. E.; Villiger, M.; Fukuda, D.

2026-06-01 cardiovascular medicine 10.64898/2026.05.22.26353889 medRxiv

Top 2%

0.1%

Background: Quantitative lipid assessment is central to identifying rupture-prone coronary plaques and represents a therapeutic target for lipid-lowering therapy. Near-infrared spectroscopy (NIRS)-derived lipid core burden index (LCBI) is well validated and widely used for detecting lipid-rich lesions. Optical frequency domain imaging (OFDI) is increasingly adopted for guiding percutaneous coronary intervention (PCI) due to its high-resolution structural imaging capabilities. Depolarization-sensitive OFDI (depOFDI) provides intrinsic lipid contrast and may enable combined structural and compositional plaque characterization within a single OFDI-based platform. Objective: To define an OFDI-derived lipid metric and evaluate its agreement with NIRS-derived LCBI. Methods: Thirty-three patients underwent both polarization-sensitive OFDI and NIRS-intravascular ultrasound imaging during PCI. After exclusion of 4 datasets, 29 co-registered pullbacks were analyzed. A signal-to-noise-corrected depolarization metric was used to identify lipid-rich regions and generate depOFDI chemograms. maxLCBI4mm value and location, as well as total LCBI, were computed and compared with NIRS. Results: depOFDI demonstrated strong agreement with NIRS, showing high correlation for maxLCBI4mm (r^2 = 0.862) and total LCBI (r^2 = 0.867), along with strong spatial concordance for the location of the maxLCBI4mm (r^2 = 0.900). Bland-Altman analysis of LCBI4mm showed minimal bias (10.7) with 95% limits of agreement of [-81.4 to 102.8]. Conclusions: depOFDI enables accurate quantification of lipid burden alongside the high-resolution structural information inherently provided by OFDI. Because depolarization metrics can be derived from polarization-diverse detection available in many commercial OFDI systems, this approach provides a practical pathway toward comprehensive plaque characterization within existing PCI workflows, without the need for additional imaging modalities.
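The Bland-Altman figures quoted in this abstract follow a standard construction: the bias is the mean of the paired differences, and the 95% limits of agreement are the bias plus or minus 1.96 times the SD of those differences. A minimal sketch of that calculation follows; the function name and the paired readings are hypothetical, and it is not the authors' analysis code.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b):
        raise ValueError("measurements must be paired")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample SD of the paired differences
    return bias, bias - half_width, bias + half_width


if __name__ == "__main__":
    # Hypothetical paired index values from two methods, not data from the study.
    method_a = [310.0, 420.0, 150.0, 505.0, 260.0]
    method_b = [295.0, 440.0, 160.0, 470.0, 255.0]
    bias, lower, upper = bland_altman(method_a, method_b)
    print(f"bias={bias:.1f}, limits of agreement=[{lower:.1f}, {upper:.1f}]")
```

By construction the two limits sit symmetrically about the bias, so the midpoint of a reported interval should equal the reported bias.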

16

Impact of AI-Assisted Mammography Reading on Quality Indicators in the Czech Breast Cancer Screening Programme: A Retrospective Study

Veverkova, L.; Dolezalova, Z.; Marackova, V.; Mathew, E.; Urbankova, M.; Ambrozova, M.; Piskovsky, T.; Ngo, O.; Majek, O.

2026-05-26 oncology 10.64898/2026.05.25.26353869 medRxiv

Top 2%

0.0%


Objectives: The aim of mammographic screening is the early detection of invasive cancers, and artificial intelligence (AI) may improve diagnosis at earlier stages. The purpose of this study was to retrospectively assess the impact of AI-assisted reading on selected quality indicators. Method: Data came from the Breast Cancer Screening Registry for one Screening Unit that currently uses AI routinely. The cancer detection rate (CDR), further assessment rate (FAR), and recall rate (RR) in women aged 45-69 were compared between 2023, when AI was used, and 2022, without AI. The statistical evaluation used the chi-square test and logistic regression adjusting for the effects of age, a woman's risk level, and the screening round at a 5% significance level. Results: In 2022, without AI, 4,034 women aged 45-69 were included, compared with 4,049 women in 2023 when AI was used. This study showed a non-significant increase in CDR from 5.0 breast cancers detected per 1,000 women (non-AI assessment) to 5.2 (AI-assisted assessment), p = 0.919; OR (95% CI): 1.034 (0.542-1.974), a significant decrease in the FAR from 5.2% to 3.9%, p < 0.001; OR (95% CI): 0.665 (0.529-0.836), and a non-significant decrease in RR from 2.4% to 1.9%, p = 0.083; OR (95% CI): 0.754 (0.548-1.037). Conclusion: AI has the potential to be a useful tool in the early detection of breast cancer by improving quality through a decrease in FAR and RR, while probably maintaining CDR.
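As a rough illustration of how an odds ratio with a 95% confidence interval relates to the reported rates, the sketch below computes an unadjusted odds ratio with a Wald interval. The event counts are assumptions back-calculated from the rounded CDR figures and cohort sizes in the abstract, not counts reported by the authors, and the abstract's own estimates come from an adjusted logistic regression, so the numbers here are expected to differ somewhat.

```python
import math

def odds_ratio_wald(events_exp, n_exp, events_ref, n_ref, z=1.96):
    """Unadjusted odds ratio (exposed vs reference) with a Wald confidence interval."""
    a, b = events_exp, n_exp - events_exp   # exposed: events, non-events
    c, d = events_ref, n_ref - events_ref   # reference: events, non-events
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# ASSUMPTION: counts reconstructed from rounded rates (5.0 and 5.2 per 1,000).
cancers_2022 = round(5.0 / 1000 * 4034)
cancers_2023 = round(5.2 / 1000 * 4049)

or_, lo, hi = odds_ratio_wald(cancers_2023, 4049, cancers_2022, 4034)
print(f"OR={or_:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

An interval that spans 1 corresponds to a non-significant difference at the 5% level, which matches how the abstract describes the CDR comparison.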

17

Multi-Agent AI for Chest Radiography: A Sequential Segmentation and LLM-Driven Consultative Tool for Medical Training

Kurt, F.; Subasi, A.

2026-06-01 health informatics 10.64898/2026.05.29.26354432 medRxiv

Top 3%

0.0%


Background: Traditional diagnostic models lack explainability, while multimodal language models prone to hallucination remain unsafe for medical education. An interactive, risk-free artificial intelligence framework is required to serve as a reliable clinical mentor for radiology trainees. Methods: We propose a multi-agent architecture decoupling deterministic image analysis from generative consultation. Specialized computer vision models perform anatomical localization and pathological segmentation. These quantitative outputs are synthesized into a structured payload, which grounds a locally hosted large language model (LLaVA 7B) using strict prompt guardrails and prerequisite protocols. Results: The system effectively eliminates visual hallucinations by intercepting unanchored queries. The artificial intelligence tutor successfully contextualizes spatial anomalies and baseline metrics, generating accurate conversational explanations and formally structured radiology reports while strictly enforcing medical safety disclaimers. Discussion and Conclusion: By anchoring language generation exclusively to verified algorithmic realities, this framework transforms opaque diagnostic models into safe, interactive educational simulators. This establishes a highly reliable paradigm for integrating explainable artificial intelligence into medical training.

18

Vaginal Antisepsis for Major Gynecologic Surgeries Using Chlorhexidine Gluconate versus Povidone Iodine: A Systematic Review and Meta-Analysis

Dias, Y.; Gebrekidan, F.; Lowder, J.; Sutcliffe, S.; Yaeger, L.

2026-05-27 obstetrics and gynecology 10.64898/2026.05.26.26353429 medRxiv

Top 3%

0.0%


OBJECTIVE: We performed a systematic review and meta-analysis (SRMA) of post-surgical outcomes, comparing chlorhexidine gluconate (CHG) versus povidone iodine (PI) for vaginal antisepsis of major gynecologic procedures. DATA SOURCES: Ovid Medline, Embase, Scopus, Cochrane, and Clinicaltrials.gov were searched between 1986 and December 2023 for studies comparing CHG with PI for vaginal antisepsis of major gynecologic operations. STUDY ELIGIBILITY CRITERIA: We included randomized controlled trials (RCTs) and non-RCTs comparing CHG to PI for vaginal antisepsis of major gynecologic operations. The primary outcome was surgical site infections (SSIs) and the secondary outcomes were urinary tract infections (UTIs) and vaginal irritation. METHODS: Summary estimates were calculated by fixed effects models when I2 ≤ 25% and by random effects models when I2 > 25%. Statistical analysis was performed using RevMan 5.4.1. The protocol for this systematic review was registered on PROSPERO (ID CRD42022378101). RESULTS: Nine studies met the inclusion criteria, four of which were RCTs. A total of 9538 patients were included, 4300 (45%) of whom were allocated to CHG and 5238 (55%) to PI. No statistically significant difference in SSI incidence was found for vaginal antisepsis with CHG versus PI in pooled analyses (n = 9538 patients; RR 1.20; 95% CI 0.92-1.57; I2 = 0%). In contrast, a significantly higher risk of UTIs was observed for vaginal antisepsis with CHG than with PI (n = 6061 patients; RR 1.48; 95% CI 1.03-2.14; I2 = 0%). CONCLUSION: In our SRMA, there were no significant differences in SSI risk when either CHG or PI was utilized for antiseptic vaginal preparation. Interestingly, vaginal antisepsis with PI was associated with a lower incidence of post-operative UTIs following major gynecologic surgery. Our findings support current guidelines that either form of vaginal antisepsis can be used for SSI prevention. They also suggest that PI may result in fewer postoperative UTIs, but further randomized studies are needed to support these findings. Key words: surgical site infection, surgical wound infection, urinary tract infection, urogynecologic surgery, Chlorhexidine, Povidone Iodine, surgical antiseptic
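The model-selection rule in the methods (fixed effects when I2 ≤ 25%, random effects otherwise) can be sketched as follows. This is an illustrative assumption-laden example, not the authors' analysis: it uses generic inverse-variance pooling of log risk ratios with a DerSimonian-Laird between-study variance, whereas the review was run in RevMan, whose default methods may differ. The function name `pool_log_rr` and all input values are hypothetical.

```python
import math

def pool_log_rr(log_rrs, variances, i2_threshold=25.0):
    """Pool log risk ratios, choosing the model from I2 as in the stated rule.

    Returns (model_name, pooled_log_rr, standard_error, i2_percent).
    """
    w = [1.0 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, log_rrs)) / sum(w)
    # Cochran's Q and the I2 heterogeneity statistic.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rrs))
    df = len(log_rrs) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    if i2 <= i2_threshold:
        return "fixed", fixed, math.sqrt(1.0 / sum(w)), i2
    # DerSimonian-Laird estimate of the between-study variance tau^2.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, log_rrs)) / sum(w_star)
    return "random", pooled, math.sqrt(1.0 / sum(w_star)), i2

# Hypothetical study-level inputs (illustration only, not data from the review).
homogeneous = pool_log_rr([0.15, 0.20, 0.18], [0.04, 0.05, 0.03])
heterogeneous = pool_log_rr([-0.5, 0.1, 0.9], [0.02, 0.02, 0.02])

for model, est, se, i2 in (homogeneous, heterogeneous):
    print(f"{model}: RR={math.exp(est):.2f}, I2={i2:.0f}%")
```

With low heterogeneity the two models give nearly identical pooled estimates; the random-effects model mainly widens the confidence interval when studies disagree.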

19

An ECG foundation model for generalizable cardiac function prediction across the lifespan

Yang, Y.; Peracchio, L.; Mayourian, J.; Miller, T.; La Cava, W.

2026-05-27 health informatics 10.64898/2026.05.26.26354128 medRxiv

Top 3%

0.0%


Background Artificial intelligence-enhanced electrocardiography (AI-ECG) enables scalable, low-cost cardiac dysfunction screening, but existing models are annotation-intensive and predominantly adult-derived, leaving paediatric generalizability uncertain. Paediatric cohorts exhibit highly variable cardiac morphology and function compared to adults, which may be useful for learning generalizable AI-ECG models. Methods We pretrained ECG-Fyler on a predominantly paediatric, all-age cohort at Boston Children's Hospital (1992-2023), annotated with a cardiology-specific coding system (Fyler codes), and evaluated it on assessments from echocardiography (echo) and cardiac magnetic resonance (CMR) studies. We validated on an external adult cohort from Columbia University Irving Medical Center. Performance was benchmarked against several AI-ECG foundation models by AUROC across age groups, lesion types, and limited-data scenarios. Findings The pretraining cohort comprised 782,138 ECGs from 255,271 patients (median age: 10.9 years, IQR: [2.8-16.8]). Internal evaluation included 178,495 ECG-echo pairs (median age: 10.9 [3.7-17.0]) and 8,584 ECG-CMR pairs (median age: 20.7 [15.6-29.6]). External validation included 82,543 ECG-echo pairs from adults (median age: 64.0 [52.0-74.0]). ECG-Fyler improved AUROC across biventricular dysfunction and dilation tasks, with the largest gains in low-data settings. In internal validation, ECG-Fyler detected low left ventricular ejection fraction (LVEF ≤ 40%) from only 100 fine-tuning samples (AUROC: 0.80, 95% CI: [0.78-0.80]), outperforming other models (AUROC < 0.65) and improving with additional fine-tuning (AUROC: 0.94 [0.93-0.94]). Similar improvements were observed for CMR-derived LVEF, RVEF, and ventricular dilation. In external validation on adults, ECG-Fyler exhibited an AUROC of 0.83 (CI: [0.82-0.85]) for LVEF ≤ 40%. After fine-tuning on less than 10% of external data, LVEF ≤ 45% performance (AUROC: 0.87 [0.86-0.88]) outperformed a fully trained, site-specific prior model (AUROC: 0.85 [0.84-0.87]). Interpretation Pretraining on richly annotated, paediatric-dominant ECGs yields models that transfer efficiently across institutions and ages, supporting AI-ECG screening and triage when labels or imaging access are limited. Funding National Institutes of Health (R01LM012973); Kostin Innovation Fund, Boston Children's Hospital

20

Patient Versus Prediction-Level Evaluation of a Dynamic Clinical Prediction Model of Sepsis

Tuttle, M.; Maas, C. C. H. M.; An, J.; Wessler, B. S.; Harvey, W. F.; Selker, H. P.; van Klaveren, D.; Kent, D. M.

2026-05-27 health systems and quality improvement 10.64898/2026.05.26.26354141 medRxiv

Top 3%

0.0%


The Epic Sepsis Model version 2 (ESMv2) is a prediction model embedded in the electronic medical record and used to warn clinicians which hospitalized patients are at risk for sepsis. We conducted a retrospective cohort study of 31,951 hospitalizations of 25,760 patients to compare analyses conducted at the commonly used patient level (where the maximum prediction prior to the onset of sepsis is used to measure performance) with analyses at a novel prediction level (where each prediction is used to measure performance). Sepsis, defined by the Sepsis-3 criteria, occurred during 1,049 hospitalizations (3.3%). Patient-level analyses suggested excellent discrimination (AUC 0.86; IQR 0.85, 0.87), whereas prediction-level analyses demonstrated lower performance (AUC 0.62; IQR 0.57, 0.65). Low estimates of the positive predictive value (14.5% at the patient level vs 4% at the prediction level) imply a high number of false alerts. Common evaluation approaches may overstate the performance of dynamic prediction models and mislead clinical decision-making.
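The difference between the two evaluation levels can be illustrated with a small sketch. Everything here is hypothetical: the toy scores, the `stays` structure, and the choice to label every prediction with its hospitalization's outcome are assumptions for illustration, and all scores are treated as occurring before sepsis onset. The abstract does not specify how individual predictions were labelled in the study.

```python
def auc(scores, labels):
    """Mann-Whitney AUC: chance that a positive scores above a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical hospitalizations: id -> (sepsis label, risk scores over time).
stays = {
    "A": (1, [0.30, 0.45, 0.80]),
    "B": (1, [0.15, 0.38, 0.70]),
    "C": (0, [0.30, 0.25, 0.35]),
    "D": (0, [0.10, 0.40, 0.20]),
}

# Patient level: one observation per hospitalization, using the maximum score.
patient_scores = [max(scores) for _, scores in stays.values()]
patient_labels = [label for label, _ in stays.values()]
patient_auc = auc(patient_scores, patient_labels)

# Prediction level: every score is its own observation (labelling is an assumption).
pred_scores = [s for _, scores in stays.values() for s in scores]
pred_labels = [label for label, scores in stays.values() for _ in scores]
pred_auc = auc(pred_scores, pred_labels)

print(f"patient-level AUC={patient_auc:.2f}, prediction-level AUC={pred_auc:.2f}")
```

Taking the maximum discards the many low early scores from patients who later develop sepsis, which is one way a patient-level summary can look better than the alerts a clinician actually sees.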